This paper deals with the problem of selecting good binomial populations
compared with a standard or a control through the empirical Bayes approach.
Two cases are studied: one with the prior distribution completely
unknown and the other with the prior distribution symmetric about
$p = 1/2$ but otherwise unknown. In each case, empirical Bayes rules
are derived and their rates of convergence are shown to be of order
$O(\exp(-cn))$ for some $c > 0$, where $n$ is the number of accumulated past
experiences at hand.
Empirical Bayes Rules for Selecting
Good Binomial Populations
by 
1. Introduction

The  empirical  Bayes  approach  in  statistical  decision  theory  is  appropriate 
when  one  is  confronted  repeatedly  and  independently  with  the  same  decision 
problem.  In  such  instances,  it  is  reasonable  to  formulate  the  component 
problem  in  the  sequence  as  a  Bayes  decision  problem  with  respect  to  an  unknown 
prior  distribution  on  the  parameter  space  and  then  use  the  accumulated 
observations  to  improve  the  decision  rule  at  each  stage.  This  approach  is 
due  to  Robbins  (1956,  1964,  1983).  Many  such  empirical  Bayes  rules  have  been 
shown  to  be  asymptotically  optimal  in  the  sense  that  the  risk  for  the  nth 
decision  problem  converges  to  the  optimal  Bayes  risk  which  would  have  been 
obtained  if  the  prior  distribution  was  known  and  the  Bayes  rule  with  respect 
to  this  prior  distribution  was  used. 

Empirical Bayes rules have been derived for multiple decision problems
by Deely (1965) for selecting a subset containing the best population.
Van Ryzin (1970), Huang (1975), Van Ryzin and Susarla (1977) and Singh (1977)
also studied other multiple decision problems by using the empirical Bayes
approach. Recently, Gupta and Hsiao (1983) and Gupta and Leu (1983) studied
empirical Bayes rules for selecting good populations with respect to a
standard or a control with the underlying populations being uniformly
distributed.

In  this  paper,  we  are  concerned  with  the  problem  of  selecting  good 
binomial  populations  with  respect  to  a  control  through  the  empirical  Bayes 
approach.  Two  cases  have  been  studied:  one  with  the  prior  distribution 
completely  unknown  and  the  other  with  the  prior  distribution  symmetrical 
about $p = 1/2$ but otherwise unknown. In each case, empirical Bayes rules
are derived and their rates of convergence are shown to be of order
$O(\exp(-c_i n))$ for some $c_i > 0$, $i = 1, 2$. For the case of the symmetrical
prior  distribution  two  smoothing  methods  are  studied  in  order  to  improve 
the  performance  of  the  sequence  of  empirical  Bayes  rules. 

2. Formulation of the Empirical Bayes Approach

Let $\pi_0, \pi_1, \ldots, \pi_k$ denote $k + 1$ populations and let $X_i$ be a random
observation from $\pi_i$. Assume that $X_i \sim B(N_i, p_i)$, where $p_i \in (0,1)$ and $N_i$ is
fixed and known. Let $\pi_0$ be the control population. For each $i = 1, \ldots, k$,
population $\pi_i$ is said to be good if $p_i \ge p_0$ and bad if $p_i < p_0$, where
the control parameter $p_0$ is either known or unknown. Our goal is to derive
some empirical Bayes rules to select all the good populations and exclude
all the bad populations.

When the control parameter $p_0$ is known, the empirical Bayes framework
can be formulated as follows:

(1) Let $\Omega = \{\mathbf{p} \mid \mathbf{p} = (p_1, \ldots, p_k),\ p_i \in (0,1) \text{ for } i = 1, 2, \ldots, k\}$. For each
$\mathbf{p} \in \Omega$, define $A(\mathbf{p}) = \{i \mid p_i \ge p_0\}$, $B(\mathbf{p}) = \{i \mid p_i < p_0\}$. That is, $A(\mathbf{p})$ ($B(\mathbf{p})$)
is the set of indices of good (bad) populations.

(2) Let $\mathcal{A} = \{a \mid a \subset \{1, 2, \ldots, k\}\}$ be the action space. When action $a$ is
taken, population $\pi_i$ is selected as a good population if
$i \in a$, and excluded as a bad population if $i \notin a$.




(3) The loss function $L(\mathbf{p}, a)$ is defined as follows:

(2.1)  $L(\mathbf{p}, a) = \sum_{i \in A(\mathbf{p}) - a} (p_i - p_0) + \sum_{i \in a - A(\mathbf{p})} (p_0 - p_i)$,

where the first summation is the loss due to not selecting some good
populations and the second summation is the loss due to selecting some
bad populations.

(4) Let $dG(\mathbf{p}) = \prod_{i=1}^{k} dG_i(p_i)$ be the prior distribution over the parameter
space $\Omega$, where $G_i(\cdot)$ are unknown for all $i = 1, 2, \ldots, k$.

(5) For each $i$, let $(X_{ij}, P_{ij})$, $j = 1, 2, \ldots$, be pairs of random variables
associated with population $\pi_i$, where $X_{ij}$ is observable but $P_{ij}$ is not
observable. $P_{ij}$ has distribution $G_i$. Conditional on $P_{ij} = p_{ij}$, $X_{ij} \mid p_{ij}$
is binomially distributed with parameters $N_i$ and $p_{ij}$. For the case where
the prior distributions $G_i$ are completely unknown, some additional
observations $\mathbf{Y}_{ij} = (Y_{ij1}, \ldots, Y_{ijn_i})$ from each population $\pi_i$, $i = 1, 2, \ldots, k$,
are assumed to be at hand, where $Y_{ijm}$, $m = 1, \ldots, n_i$, are i.i.d.,
independent of $X_{ij} \mid P_{ij}$ and follow the $B(1, P_{ij})$ distribution. Thus, in this case
the $j$th stage observations are $\mathbf{Z}_j = ((X_{1j}, \mathbf{Y}_{1j}), \ldots, (X_{kj}, \mathbf{Y}_{kj}))$. For the
second case where the $G_i$ are assumed to be symmetric about $p = 1/2$, no
additional data are needed for the construction of our empirical Bayes rule.

(6) Let $\mathbf{X} = (X_1, \ldots, X_k)$ be the present observation. Conditional on
$\mathbf{p} = (p_1, \ldots, p_k)$, $\mathbf{X}$ has joint probability function $f(\mathbf{x} \mid \mathbf{p}) = \prod_{i=1}^{k} f_i(x_i \mid p_i)$,
where $f_i(x \mid p) = \binom{N_i}{x} p^x (1-p)^{N_i - x}$ for each $i = 1, \ldots, k$.


Finally, since we are interested in Bayes rules, we can restrict our
attention to the nonrandomized rules.

(7) Let $D = \{d \mid d : \mathcal{X} \to \mathcal{A},\ d \text{ measurable}\}$ be the set of nonrandomized
rules, where $\mathcal{X} = \prod_{i=1}^{k} \{0, 1, \ldots, N_i\}$. For each $d \in D$, let $r(G, d)$ denote the
associated Bayes risk. Then, $r(G) = \inf_{d \in D} r(G, d)$ is the minimum Bayes risk.

When the control parameter $p_0$ is unknown, for the related framework,
the indices in the associated notations should begin at 0 instead of at 1.
In the sequel, (0) will be used to indicate this additional index.
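The loss in (2.1) can be illustrated with a minimal Python sketch for the case of known $p_0$. The function name `loss` and the numerical values of `p` and `p0` are hypothetical and chosen only for illustration.

```python
def loss(p, p0, action):
    """Loss (2.1): missed good populations plus wrongly selected bad ones.

    p      -- list of p_1, ..., p_k (population i is p[i - 1])
    p0     -- known control parameter
    action -- set of selected indices, a subset of {1, ..., k}
    """
    good = {i for i, pi in enumerate(p, start=1) if pi >= p0}   # A(p)
    missed = sum(p[i - 1] - p0 for i in good - action)          # i in A(p) - a
    wrong = sum(p0 - p[i - 1] for i in action - good)           # i in a - A(p)
    return missed + wrong


p = [0.3, 0.6, 0.8]   # hypothetical parameters p_1, p_2, p_3
p0 = 0.5              # hypothetical control value
print(loss(p, p0, {2, 3}), loss(p, p0, {1, 2}))
```

Selecting exactly the good populations gives zero loss; every other action is penalized by the distances $|p_i - p_0|$ of the misclassified populations.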

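We now consider an empirical Bayes decision rule $d_n(\mathbf{x}; \mathbf{Z}_1, \ldots, \mathbf{Z}_n)$ whose
form depends on $\mathbf{x}$ and $\mathbf{Z}_j$, $j = 1, \ldots, n$. Let $r(G, d_n)$ be the Bayes risk
associated with decision rule $d_n(\mathbf{x}; \mathbf{Z}_1, \ldots, \mathbf{Z}_n)$. That is,

$r(G, d_n) = \sum_{\mathbf{x} \in \mathcal{X}} E \int_{\Omega} L(\mathbf{p}, d_n(\mathbf{x}; \mathbf{Z}_1, \ldots, \mathbf{Z}_n))\, f(\mathbf{x} \mid \mathbf{p})\, dG(\mathbf{p})$,

where the expectation $E$ is taken with respect to $(\mathbf{Z}_1, \ldots, \mathbf{Z}_n)$. For simplicity,
$d_n(\mathbf{x}; \mathbf{Z}_1, \ldots, \mathbf{Z}_n)$ will be denoted by $d_n(\mathbf{x})$.

Definition 2.1. A sequence of decision rules $\{d_n(\mathbf{x})\}_{n=1}^{\infty}$ is said to be
asymptotically optimal (a.o.) relative to the prior distribution $G$ if
$r(G, d_n) \to r(G)$ as $n \to \infty$.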

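For constructing a sequence of a.o. rules, we first need to find the
minimum Bayes risk and the associated Bayes rule, say $d_G$. From (2.1), the
Bayes risk associated with decision rule $d$ is

(2.2)  $r(G, d) = \sum_{\mathbf{x} \in \mathcal{X}} \sum_{i \in d(\mathbf{x})} \Delta_{iG}(\mathbf{x}) \prod_{j \ne i} f_j(x_j) + C$,

where

(2.3)  $\Delta_{iG}(\mathbf{x}) = p_0 f_i(x_i) - W_i(x_i)$ if $p_0$ is known;
       $\Delta_{iG}(\mathbf{x}) = W_0(x_0) f_i(x_i) - W_i(x_i) f_0(x_0)$ if $p_0$ is unknown;

$f_i(x) = \int_0^1 f_i(x \mid p)\, dG_i(p)$;

$W_i(x) = \int_0^1 p\, f_i(x \mid p)\, dG_i(p)$;

and $C = \int_{\Omega} \sum_{i \in A(\mathbf{p})} (p_i - p_0)\, dG(\mathbf{p})$ is a constant that does not depend on $d$.
The product in (2.2) runs over the indices $j \ne i$ that do not already appear in
$\Delta_{iG}(\mathbf{x})$, so $j = 0$ is also excluded when $p_0$ is unknown.

Hence, the Bayes rule $d_G$ can be obtained as follows:

(2.4)  $d_G(\mathbf{x}) = \{i \mid \Delta_{iG}(\mathbf{x}) \le 0\}$.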
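To make the Bayes rule (2.4) concrete, the following sketch evaluates $f_i(x)$, $W_i(x)$ and $\Delta_{iG}$ for known $p_0$ under a discrete prior. The two-point prior, the function names and the data are assumptions made for illustration only; in the empirical Bayes setting the prior is unknown and this computation is not available.

```python
from math import comb


def binom_pmf(x, N, p):
    """f_i(x | p): binomial probability of x successes in N trials."""
    return comb(N, x) * p**x * (1 - p)**(N - x)


def marginals(x, N, prior):
    """Return (f(x), W(x)) for a discrete prior given as [(p, weight), ...]."""
    f = sum(w * binom_pmf(x, N, p) for p, w in prior)
    W = sum(w * p * binom_pmf(x, N, p) for p, w in prior)
    return f, W


def bayes_rule(xs, Ns, priors, p0):
    """Rule (2.4) for known p0: select i when p0*f_i(x_i) - W_i(x_i) <= 0."""
    selected = set()
    for i, (x, N, prior) in enumerate(zip(xs, Ns, priors), start=1):
        f, W = marginals(x, N, prior)
        if p0 * f - W <= 0:
            selected.add(i)
    return selected


prior = [(0.2, 0.5), (0.8, 0.5)]   # hypothetical two-point prior
chosen = bayes_rule([3, 1], [4, 4], [prior, prior], p0=0.5)
print(chosen)
```

Since $W_i(x)/f_i(x)$ is the posterior mean of $p_i$ given $x$, the condition $\Delta_{iG} \le 0$ selects population $i$ exactly when that posterior mean is at least $p_0$.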


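Now, for each $i = (0), 1, \ldots, k$, and for each $n = 1, 2, \ldots$, let
$W_{in}(x_i) \equiv W_{in}(x_i; (X_{i1}, \mathbf{Y}_{i1}), \ldots, (X_{in}, \mathbf{Y}_{in}))$ be an estimator of $W_i(x_i)$
and $f_{in}(x_i) \equiv f_{in}(x_i; (X_{i1}, \mathbf{Y}_{i1}), \ldots, (X_{in}, \mathbf{Y}_{in}))$ be an estimator of $f_i(x_i)$.

Define

(2.5)  $\Delta_{in}(\mathbf{x}) = W_{0n}(x_0) f_{in}(x_i) - W_{in}(x_i) f_{0n}(x_0)$ if $p_0$ is unknown;
       $\Delta_{in}(\mathbf{x}) = p_0 f_{in}(x_i) - W_{in}(x_i)$ if $p_0$ is known,

and

(2.6)  $d_n(\mathbf{x}) = \{i \mid \Delta_{in}(\mathbf{x}) \le 0\}$.
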
If $W_{in}(x) \xrightarrow{P} W_i(x)$ and $f_{in}(x) \xrightarrow{P} f_i(x)$ for all $x = 0, 1, \ldots, N_i$, where
"$\xrightarrow{P}$" means convergence in probability, then $\Delta_{in}(\mathbf{x}) \xrightarrow{P} \Delta_{iG}(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}$.
Therefore, from Corollary 2 of Robbins (1964), it follows that $r(G, d_n) \to r(G)$
as $n \to \infty$. So, the sequence of decision rules $\{d_n(\mathbf{x})\}$ defined in (2.6) is
asymptotically optimal for our selection problem. Hence, in the following,
we have only to find sequences of estimators $\{W_{in}(x)\}$ and $\{f_{in}(x)\}$ possessing
the above mentioned convergence property.

3.  Case  when  the  Prior  Distribution  is  Completely  Unknown 

Robbins (1964) and Samuel (1963), respectively, pointed out that there
was no way of approximating $W_i(x)$ just by using the observations $(X_{i1}, \ldots, X_{in})$.
In order to remedy this deficiency, we take, at each stage, some more observations
$(Y_{ij1}, \ldots, Y_{ijn_i})$ in our model, where $n_i$ can be any positive integer.
For simplicity, let $n_i = 1$ for all $i = (0), 1, \ldots, k$.

Estimation of $W_i(x)$ and $f_i(x)$

A usual estimator of $f_i(x)$ can be given as follows:

(3.1)  $f_{in}(x) = \frac{1}{n} \sum_{j=1}^{n} I_{\{x\}}(X_{ij})$ for $x = 0, 1, \ldots, N_i$.

Then $f_{in}(x)$ is an unbiased estimator of $f_i(x)$, and by the strong law of
large numbers, $f_{in}(x) \to f_i(x)$ with probability 1 for each $x = 0, 1, \ldots, N_i$.
Hence, $f_{in}(x) \xrightarrow{P} f_i(x)$ for all $x = 0, 1, \ldots, N_i$.

For the estimation of $W_i(x)$, we consider the following. Define

(3.2)  $V_{ij}(x) = Y_{ij}\, I_{\{x\}}(X_{ij})$.

Under the assumption (5) of Section 2, it is easy to see that $E[V_{ij}(x)] = W_i(x)$.
We then define

(3.3)  $W_{in}(x) = \frac{1}{n} \sum_{j=1}^{n} V_{ij}(x)$.

Since $V_{ij}(x)$, $j = 1, 2, \ldots$, are i.i.d. and bounded, it is easy to show that
$W_{in}(x) \to W_i(x)$ with probability one for all $x = 0, 1, \ldots, N_i$. Now, let $\Delta_{in}(\mathbf{x})$
and $d_n(\mathbf{x})$ be defined as in (2.5) and (2.6), respectively. From the discussion
of Section 2 and the construction of the sequence of decision rules
$\{d_n\}_{n=1}^{\infty}$ through (2.5), (2.6), (3.1) and (3.3), we get the following result.
Theorem 3.1. For our decision problem, the sequence of decision rules
$\{d_n\}_{n=1}^{\infty}$ is asymptotically optimal relative to the prior distribution $G$.
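The estimators (3.1) and (3.3) and the resulting rule (2.6) can be sketched as follows for known $p_0$ and $n_i = 1$. The function names and the toy data are hypothetical; this is an illustration of the construction, not a reference implementation.

```python
def eb_estimates(xs, ys, N):
    """f_in (3.1) and W_in (3.3) from past pairs (X_ij, Y_ij), j = 1..n."""
    n = len(xs)
    f_n = [sum(1 for x in xs if x == v) / n for v in range(N + 1)]
    W_n = [sum(y for x, y in zip(xs, ys) if x == v) / n for v in range(N + 1)]
    return f_n, W_n


def eb_select(x_now, history, Ns, p0):
    """Rule (2.6) with known p0; history[i - 1] = (xs, ys) for population i."""
    selected = set()
    for i, (x, (xs, ys), N) in enumerate(zip(x_now, history, Ns), start=1):
        f_n, W_n = eb_estimates(xs, ys, N)
        if p0 * f_n[x] - W_n[x] <= 0:      # Delta_in(x) <= 0, see (2.5)
            selected.add(i)
    return selected


xs, ys = [0, 1, 1, 2], [0, 1, 0, 1]        # toy past data with N = 2
f_n, W_n = eb_estimates(xs, ys, 2)
print(f_n, W_n, eb_select([2], [(xs, ys)], [2], 0.5))
```

Note that when the current value $x_i$ has never been observed in the past data, both estimates are 0, so $\Delta_{in} = 0$ and the population is selected under the "$\le 0$" convention.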

Rate of Convergence of Empirical Bayes Rules $\{d_n\}$

Let $\{d_n\}_{n=1}^{\infty}$ be a sequence of empirical Bayes rules relative to the
prior distribution $G$. Since the Bayes rule $d_G$ achieves the minimum Bayes
risk $r(G)$ relative to $G$, $r(G, d_n) - r(G) \ge 0$ for all $n = 1, 2, \ldots$. Thus, the
nonnegative difference $r(G, d_n) - r(G)$ is used as a measure of the optimality
of the sequence of empirical Bayes rules $\{d_n\}_{n=1}^{\infty}$.

Definition 3.1. The sequence of empirical Bayes rules $\{d_n\}_{n=1}^{\infty}$ is said to
be asymptotically optimal at least of order $\alpha_n$ relative to $G$ if
$r(G, d_n) - r(G) \le O(\alpha_n)$ as $n \to \infty$, where $\lim_{n \to \infty} \alpha_n = 0$.

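For each $i = 1, \ldots, k$, define $S_i = \{\mathbf{x} \in \mathcal{X} \mid \Delta_{iG}(\mathbf{x}) < 0\}$, $T_i = \{\mathbf{x} \in \mathcal{X} \mid \Delta_{iG}(\mathbf{x}) > 0\}$.
Let $\varepsilon_i = \min_{\mathbf{x} \in S_i} (-\Delta_{iG}(\mathbf{x}))$, $\varepsilon_i' = \min_{\mathbf{x} \in T_i} \Delta_{iG}(\mathbf{x})$ and
$\varepsilon = \min\big(\min_{1 \le i \le k} \varepsilon_i,\ \min_{1 \le i \le k} \varepsilon_i'\big)$. Since
$\mathcal{X}$ is a finite space, $\varepsilon > 0$. Now, by the fact that $0 \le f_j(x_j) \le 1$ and
$|\Delta_{iG}(\mathbf{x})| \le 1$, with straightforward calculations, one can obtain

(3.4)  $0 \le r(G, d_n) - r(G) \le \sum_{i=1}^{k} \Big\{ \sum_{\mathbf{x} \in S_i} P\{\Delta_{in}(\mathbf{x}) > 0\} + \sum_{\mathbf{x} \in T_i} P\{\Delta_{in}(\mathbf{x}) \le 0\} \Big\}$.

From (3.4), it suffices to consider the behavior of
$P\{\Delta_{in}(\mathbf{x}) > 0\}$ when $\mathbf{x} \in S_i$ and that of $P\{\Delta_{in}(\mathbf{x}) \le 0\}$ when $\mathbf{x} \in T_i$ as $n \to \infty$ for
each $i = 1, \ldots, k$.

Note that for each $\mathbf{x} \in S_i$,

$P\{\Delta_{in}(\mathbf{x}) > 0\} = P\{\Delta_{in}(\mathbf{x}) - \Delta_{iG}(\mathbf{x}) > -\Delta_{iG}(\mathbf{x})\} \le P\{\Delta_{in}(\mathbf{x}) - \Delta_{iG}(\mathbf{x}) > \varepsilon\}$.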

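Then, by (2.3), (2.5) and the fact that $0 \le W_i(x_i), f_i(x_i), W_{in}(x_i), f_{in}(x_i) \le 1$
and $p_0 \in (0,1)$, one can obtain the following inequalities:

(3.5)  $P\{\Delta_{in}(\mathbf{x}) > 0\} \le P\{f_{in}(x_i) - f_i(x_i) > \tfrac{\varepsilon}{2}\} + P\{W_{in}(x_i) - W_i(x_i) < -\tfrac{\varepsilon}{2}\}$

when $p_0$ is known; and

(3.6)  $P\{\Delta_{in}(\mathbf{x}) > 0\} \le P\{W_{0n}(x_0) - W_0(x_0) > \tfrac{\varepsilon}{4}\} + P\{f_{in}(x_i) - f_i(x_i) > \tfrac{\varepsilon}{4}\}$
       $\quad + P\{W_{in}(x_i) - W_i(x_i) < -\tfrac{\varepsilon}{4}\} + P\{f_{0n}(x_0) - f_0(x_0) < -\tfrac{\varepsilon}{4}\}$

when $p_0$ is unknown.

(3.5) and (3.6) show that it suffices to consider the behavior of
$P\{|f_{in}(x_i) - f_i(x_i)| > \delta\}$ and $P\{|W_{in}(x_i) - W_i(x_i)| > \delta\}$ for some $\delta > 0$.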


From (3.2) and (3.3), $W_{in}(x) - W_i(x) = \sum_{j=1}^{n} A_{ij}(x)/n$, where
$A_{ij}(x) = Y_{ij} I_{\{x\}}(X_{ij}) - W_i(x)$. It is easy to see that $A_{ij}(x)$, $j = 1, \ldots, n$, are
i.i.d. with mean 0 and finite variance, say $\beta_i(x)$, since $|A_{ij}(x)| \le 1$.
Therefore, for $m \ge 2$,

$E[A_{ij}^m(x)] \le E[|A_{ij}(x)|^m] \le E[|A_{ij}(x)|^2] = \beta_i(x) \le \tfrac{1}{2} \beta_i(x)\, m!$.

Let $B_n(x) = n \beta_i(x)$. Thus, by Bernstein's inequality (see Ibragimov
and Linnik (1971), page 169), for any $\delta > 0$,

(3.7)  $P\{|W_{in}(x) - W_i(x)| > \delta\}$
       $\le P\Big\{\Big|\sum_{j=1}^{n} A_{ij}(x)\Big| > 2 B_n^{1/2}(x) \min\big(\tfrac{\delta}{2} n^{1/2} \beta_i^{-1/2}(x),\ \tfrac{1}{2} n^{1/2} \beta_i^{1/2}(x)\big)\Big\}$
       $\le 2 \exp\{-\tfrac{n}{4} \min(\delta^2 \beta_i^{-1}(x),\ \beta_i(x))\}$.

Similarly, from (3.1), $f_{in}(x) - f_i(x) = \sum_{j=1}^{n} C_{ij}(x)/n$, where
$C_{ij}(x) = I_{\{x\}}(X_{ij}) - f_i(x)$. Also, $C_{ij}(x)$, $j = 1, \ldots, n$, are i.i.d. with mean 0 and
$|C_{ij}(x)| \le 1$ and hence with finite variance, say $\alpha_i(x)$. Applying Bernstein's
inequality again, we obtain

(3.8)  $P\{|f_{in}(x) - f_i(x)| > \delta\} \le 2 \exp\{-\tfrac{n}{4} \min(\delta^2 \alpha_i^{-1}(x),\ \alpha_i(x))\}$.
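As a numerical sanity check on the exponential bound in (3.8), the sketch below compares it with the exact tail probability of $f_{in}(x)$, using the fact that $n f_{in}(x)$ is binomial with parameters $n$ and $f_i(x)$. The function names and the values of `n`, `f` and `delta` are hypothetical.

```python
from math import comb, exp


def bernstein_bound(n, delta, var):
    """Right-hand side of (3.7)/(3.8): 2*exp(-(n/4)*min(delta^2/var, var))."""
    return 2 * exp(-(n / 4) * min(delta**2 / var, var))


def exact_tail(n, f, delta):
    """Exact P(|f_n - f| > delta) when n*f_n ~ Binomial(n, f)."""
    return sum(comb(n, k) * f**k * (1 - f)**(n - k)
               for k in range(n + 1) if abs(k / n - f) > delta)


n, f, delta = 100, 0.5, 0.1
var = f * (1 - f)                 # variance alpha_i(x) of the indicator I_{x}(X_ij)
print(exact_tail(n, f, delta), bernstein_bound(n, delta, var))
```

The bound is far from tight for moderate $n$, but it decays exponentially in $n$, which is all that the rate argument requires.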


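Thus, we take $\delta = \varepsilon/4$ if $p_0$ is unknown or take $\delta = \varepsilon/2$ if $p_0$ is known. Then,
from (3.5) through (3.8), for each $\mathbf{x} \in S_i$,

(3.9)  $P\{\Delta_{in}(\mathbf{x}) > 0\} \le O(\exp\{-\tfrac{n}{4} \min(\delta^2 \alpha_i^{-1}(x_i),\ \alpha_i(x_i))\})$
       $\quad + O(\exp\{-\tfrac{n}{4} \min(\delta^2 \beta_i^{-1}(x_i),\ \beta_i(x_i))\})$.

Following an argument analogous to the above, we also get the conclusion
given below:

For each $\mathbf{x} \in T_i$, $i = 1, \ldots, k$,

(3.10)  $P\{\Delta_{in}(\mathbf{x}) \le 0\} \le O(\exp\{-\tfrac{n}{4} \min(\delta^2 \alpha_i^{-1}(x_i),\ \alpha_i(x_i))\})$
        $\quad + O(\exp\{-\tfrac{n}{4} \min(\delta^2 \beta_i^{-1}(x_i),\ \beta_i(x_i))\})$.

Now, let $c_1 = \tfrac{1}{4} \min(b_1, b_2)$, where
$b_1 = \min_{m \le i \le k} \big[\min_{0 \le x \le N_i} \min(\delta^2 \alpha_i^{-1}(x),\ \alpha_i(x))\big]$,
$b_2 = \min_{m \le i \le k} \big[\min_{0 \le x \le N_i} \min(\delta^2 \beta_i^{-1}(x),\ \beta_i(x))\big]$; here $m = 1$ if $p_0$ is known and
$m = 0$ if $p_0$ is unknown. It is clear that $c_1 > 0$ since $\beta_i(x) > 0$,
$\alpha_i(x) > 0$ and $\mathcal{X}$ is finite. Thus, we have the following theorem:

Theorem 3.2. Let $\{d_n\}_{n=1}^{\infty}$ be the sequence of asymptotically optimal rules
described in Theorem 3.1. Then, $r(G, d_n) - r(G) \le O(\exp\{-c_1 n\})$ for some $c_1 > 0$.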


An  Alternative  Empirical  Bayes  Rule 

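With the same framework as above, define

(3.11)  $T_{ij} = X_{ij} + Y_{ij}$.

Then, $T_{ij} \mid p_{ij} \sim B(N_i + 1, p_{ij})$. With $f_i(x \mid p) = \binom{N_i}{x} p^x (1-p)^{N_i - x}$, we write, from
(2.3),

$f_i(x) = \int_0^1 f_i(x \mid p)\, dG_i(p) = f_i(x, N_i)$.

Then, from (2.3), following Robbins (1956), we see that

$W_i(x) = \frac{x+1}{N_i+1}\, f_i(x+1, N_i+1)$.

Hence, let

(3.12)  $\tilde W_{in}(x) = \frac{x+1}{N_i+1} \cdot \frac{1}{n} \sum_{j=1}^{n} I_{\{x+1\}}(T_{ij})$,

and define

(3.13)  $\tilde\Delta_{in}(\mathbf{x}) = p_0 f_{in}(x_i) - \tilde W_{in}(x_i)$ if $p_0$ is known;
        $\tilde\Delta_{in}(\mathbf{x}) = \tilde W_{0n}(x_0) f_{in}(x_i) - \tilde W_{in}(x_i) f_{0n}(x_0)$ if $p_0$ is unknown;

and let $\tilde d_n$ denote the rule obtained from (2.6) with $\tilde\Delta_{in}$ in place of $\Delta_{in}$.

Note that $\tilde W_{in}(x)$ is also an unbiased consistent estimator of $W_i(x)$.
Therefore, following an argument analogous to that of (3.7), we can conclude
that $r(G, \tilde d_n) - r(G) \le O(\exp(-c_2 n))$ for some $c_2 > 0$.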
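The identity $W_i(x) = \frac{x+1}{N_i+1} f_i(x+1, N_i+1)$ behind (3.12) can be checked exactly with rational arithmetic. The discrete prior and all function names below are assumptions made for illustration only.

```python
from fractions import Fraction as Fr
from math import comb


def f_marg(x, N, prior):
    """Marginal f_i(x, N) under a discrete prior [(p, weight), ...]."""
    return sum(w * comb(N, x) * p**x * (1 - p)**(N - x) for p, w in prior)


def W_direct(x, N, prior):
    """W_i(x) computed directly from its definition in (2.3)."""
    return sum(w * p * comb(N, x) * p**x * (1 - p)**(N - x) for p, w in prior)


def W_alt_estimate(x, N, ts):
    """Estimator (3.12) from T_ij = X_ij + Y_ij, j = 1..n."""
    hits = sum(1 for t in ts if t == x + 1)
    return Fr(x + 1, N + 1) * Fr(hits, len(ts))


prior = [(Fr(1, 5), Fr(1, 3)), (Fr(3, 5), Fr(2, 3))]   # hypothetical prior
N = 3
identity_holds = all(
    W_direct(x, N, prior) == Fr(x + 1, N + 1) * f_marg(x + 1, N + 1, prior)
    for x in range(N + 1)
)
print(identity_holds, W_alt_estimate(1, N, [2, 2, 0, 4]))
```

The identity reduces to $\binom{N}{x} / \binom{N+1}{x+1} = (x+1)/(N+1)$, which is why the empirical frequency of $T_{ij} = x + 1$ estimates $W_i(x)$ up to this factor.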

4. Case when $G_i(\cdot)$ are Symmetrical about $p = 1/2$

In this section, we suppose that there is sufficient information to
tell us that $G_i(\cdot)$ are symmetrical about $p = 1/2$ for all $i = (0), 1, \ldots, k$.
Further, we also assume that $N_i$ are even integers for all $i = (0), 1, \ldots, k$.

Estimation of $W_i(x)$ and $f_i(x)$

Under the above assumptions, $f_i(x) = f_i(N_i - x)$ for all $x = 0, 1, \ldots, N_i$.
Therefore, it is reasonable to use

(4.1)  $\hat f_{in}(x) = \hat f_{in}(N_i - x) = \frac{1}{2n} \sum_{j=1}^{n} I_{\{x,\, N_i - x\}}(X_{ij})$ for $x \ne \frac{N_i}{2}$,
       $\hat f_{in}(x) = \frac{1}{n} \sum_{j=1}^{n} I_{\{x\}}(X_{ij})$ for $x = \frac{N_i}{2}$,

to estimate $f_i(x)$.
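A minimal sketch of the symmetrized estimator (4.1) follows, using exact fractions so that the symmetry is visible. The function name `f_sym` and the sample are hypothetical.

```python
from fractions import Fraction as Fr


def f_sym(xs, N):
    """Estimator (4.1): pool the counts at x and N - x (N even)."""
    n = len(xs)
    est = []
    for v in range(N + 1):
        if 2 * v == N:                       # middle point x = N/2
            est.append(Fr(sum(1 for x in xs if x == v), n))
        else:                                # average of the counts at v and N - v
            est.append(Fr(sum(1 for x in xs if x in (v, N - v)), 2 * n))
    return est


est = f_sym([0, 1, 1, 2, 4, 3], N=4)         # toy sample of X_ij values
print(est)
```

The estimate is symmetric about $N_i/2$ by construction and still sums to one, because each observation away from the middle contributes $1/(2n)$ to two cells.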


For $W_i(x)$, $x = 0, 1, \ldots, N_i$, we will construct a sequence of consistent
estimators $\{\hat W_{in}(x)\}$, in terms of $\hat f_{in}(y)$, $y = 0, 1, \ldots, N_i$, by using the
observations $(X_{ij},\ j = 1, \ldots, n)$ only. The following lemma is very helpful
for the above purpose.

Lemma 4.1. Suppose that the prior distribution $G_i(\cdot)$ is symmetric about
$p = 1/2$. Then,

(a) $W_i(x) = \frac{x+1}{N_i - x}\, W_i(N_i - x - 1)$ for each $x = 0, 1, \ldots, N_i - 1$.

(b) $W_i(x) + W_i(N_i - x) = f_i(x) = f_i(N_i - x)$ for each $x = 0, 1, \ldots, N_i$.

(c) Furthermore, if $N_i$ is an even integer, then $W_i(\frac{N_i}{2}) = \frac{1}{2} f_i(\frac{N_i}{2})$.

Proof: Direct computation.

Theorem 4.1. Suppose that $G_i(\cdot)$ is symmetric about $p = 1/2$ and $N_i$ is
an even integer. Then, for each $x = 0, 1, \ldots, N_i$, $W_i(x)$ can be represented
as a linear function of $f_i(y)$, $y = 0, 1, \ldots, N_i$.

Proof: It follows from Lemma 4.1 that for each $x = 0, 1, \ldots, N_i - 1$ and
$z = x - \frac{N_i}{2} + 1$,

(4.2)  $W_i\big(\tfrac{N_i}{2} - z\big) = \frac{N_i + 2 - 2z}{N_i + 2z}\, f_i\big(\tfrac{N_i}{2} - z + 1\big) - \frac{N_i + 2 - 2z}{N_i + 2z}\, W_i\big(\tfrac{N_i}{2} - z + 1\big)$.

Then, by (4.2), Lemma 4.1 (b), (c) and by induction, the result follows.

By Theorem 4.1, for each $x = 0, 1, \ldots, N_i$,

(4.3)  $W_i(x) = \sum_{y=0}^{N_i} \theta(N_i, x, y)\, f_i(y)$,

where the coefficients $\theta(N_i, x, y)$ depend on $N_i$, $x$ and $y$. Also, the
values of $\theta(N_i, x, y)$ can be obtained from Lemma 4.1 (c) and the iterative
relation (4.2).
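The coefficients in (4.3) can be generated mechanically from Lemma 4.1 and the recursion (4.2). The sketch below does this with exact fractions and verifies the representation against a symmetric two-point prior. The function name `theta`, the choice of prior and the use of $f_i(x)$ rather than $f_i(N_i - x)$ in Lemma 4.1(b) are illustrative assumptions, and the representation is not unique because $f_i$ is symmetric.

```python
from fractions import Fraction as Fr
from math import comb


def theta(N):
    """Return th with W(x) = sum_y th[x][y] * f(y) for even N."""
    h = N // 2

    def unit(y):
        return [Fr(int(k == y)) for k in range(N + 1)]

    th = [None] * (N + 1)
    th[h] = [Fr(1, 2) * e for e in unit(h)]              # Lemma 4.1(c)
    for x in range(h - 1, -1, -1):                       # recursion (4.2)
        r = Fr(x + 1, N - x)
        th[x] = [r * (e - t) for e, t in zip(unit(x + 1), th[x + 1])]
    for x in range(h + 1, N + 1):                        # Lemma 4.1(b)
        th[x] = [e - t for e, t in zip(unit(x), th[N - x])]
    return th


# Hypothetical prior, symmetric about 1/2, used only to check (4.3).
prior = [(Fr(1, 4), Fr(1, 2)), (Fr(3, 4), Fr(1, 2))]
N = 4


def pmf(x, p):
    return comb(N, x) * p**x * (1 - p)**(N - x)


f = [sum(w * pmf(x, p) for p, w in prior) for x in range(N + 1)]
W = [sum(w * p * pmf(x, p) for p, w in prior) for x in range(N + 1)]
th = theta(N)
W_lin = [sum(t * fy for t, fy in zip(th[x], f)) for x in range(N + 1)]
print(W_lin == W)
```

Replacing `f` by an estimate of the marginal frequencies in the last step gives a plug-in estimate of $W_i$ of the kind described next in the text.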
We then define

(4.4)  $\hat W_{in}(x) = \sum_{y=0}^{N_i} \theta(N_i, x, y)\, \hat f_{in}(y)$,

where $\hat f_{in}(y)$ is defined in (4.1).

Now, define

(4.5)  $\hat\Delta_{in}(\mathbf{x}) = \hat W_{0n}(x_0) \hat f_{in}(x_i) - \hat W_{in}(x_i) \hat f_{0n}(x_0)$ if $p_0$ is unknown,
       $\hat\Delta_{in}(\mathbf{x}) = p_0 \hat f_{in}(x_i) - \hat W_{in}(x_i)$ if $p_0$ is known,

and

(4.6)  $\hat d_n(\mathbf{x}) = \{i \mid \hat\Delta_{in}(\mathbf{x}) \le 0\}$.

From (4.1), it is clear that $\hat f_{in}(x) \to f_i(x)$ with probability 1 as
$n \to \infty$ for each $x = 0, 1, \ldots, N_i$. Therefore, from (4.3) and (4.4),
$\hat W_{in}(x) \to W_i(x)$ with probability 1 as $n \to \infty$ for each $x = 0, 1, \ldots, N_i$. Thus
we have the following theorem:

Theorem 4.2. Suppose that the prior distributions $G_i(\cdot)$ are symmetrical
about $p = 1/2$ and $N_i$ are even integers for all $i = (0), 1, \ldots, k$. Then, the
sequence of decision rules $\{\hat d_n\}_{n=1}^{\infty}$ is asymptotically optimal relative to
the prior distribution $G$.

Rate of Convergence of Empirical Bayes Rules $\{\hat d_n\}$

We now consider the rate of convergence of the empirical Bayes rules
$\{\hat d_n\}$. Following the same discussion as given in (3.4) through (3.6), and
using the fact that $\hat f_{in}(x) \to f_i(x)$ with probability 1, it suffices to consider
the behavior of $P\{\hat W_{in}(x) - W_i(x) > \delta\}$ and $P\{\hat W_{in}(x) - W_i(x) < -\delta\}$ as $n \to \infty$
for some $\delta > 0$, for each $x = 0, 1, \ldots, N_i$, $i = (0), 1, \ldots, k$.

From (4.3) and (4.4), for each $x = 0, 1, \ldots, N_i$,

$P\{\hat W_{in}(x) - W_i(x) > \delta\} = P\Big\{\sum_{y=0}^{N_i} \theta(N_i, x, y)\, [\hat f_{in}(y) - f_i(y)] > \delta\Big\}$
$\quad \le \sum_{y=0}^{N_i} P\{\theta(N_i, x, y)\, [\hat f_{in}(y) - f_i(y)] > \delta_1\}$,

where $\delta_1 = \delta/(N_i + 1)$. If $\theta(N_i, x, y) = 0$ for some $0 \le y \le N_i$, then
$P\{\theta(N_i, x, y)[\hat f_{in}(y) - f_i(y)] > \delta_1\} = 0$. So, we assume $\theta(N_i, x, y) \ne 0$. When
$\theta(N_i, x, y) > 0$, then

$P\{\theta(N_i, x, y)[\hat f_{in}(y) - f_i(y)] > \delta_1\} = P\{\hat f_{in}(y) - f_i(y) > \delta_1/\theta(N_i, x, y)\}$.

When $\theta(N_i, x, y) < 0$, then

$P\{\theta(N_i, x, y)[\hat f_{in}(y) - f_i(y)] > \delta_1\} = P\{\hat f_{in}(y) - f_i(y) < \delta_1/\theta(N_i, x, y)\}$.

In either case, the problem can be reduced to considering the convergence
rate of $P\{|\hat f_{in}(y) - f_i(y)| > \delta_2\}$ as $n \to \infty$ for some $\delta_2 > 0$. Similarly,
for the convergence rate of $P\{\hat W_{in}(x) - W_i(x) < -\delta\}$, where $x = 0, 1, \ldots, N_i$ and
$\delta > 0$, we get a similar result. Therefore, by applying Bernstein's
inequality and following an argument similar to that of (3.7), we conclude
the following theorem:

Theorem 4.3. Let $\{\hat d_n\}_{n=1}^{\infty}$ be the sequence of decision rules defined in (4.6).
Then, $\{\hat d_n\}_{n=1}^{\infty}$ is asymptotically optimal at least of order $\exp\{-c_3 n\}$ relative
to the prior distribution $G$ for some $c_3 > 0$.

5. Smoothed Empirical Bayes Rules

In this section, we again assume that $G_i(\cdot)$ are symmetrical about
$p = 1/2$ and $N_i$ are even integers for all $i = (0), 1, \ldots, k$. In Section 4,
the marginal frequency functions $f_i(x)$, $x = 0, 1, \ldots, N_i$, $i = (0), 1, \ldots, k$, are
estimated in terms of the empirical frequency functions $\hat f_{in}(x)$, regardless
of the properties associated with the marginal function $f_i(x)$. In this
section, by considering some properties related to $f_i(x)$ and $W_i(x)$, two
methods for obtaining smooth estimators of $f_i(x)$ and $W_i(x)$ are studied.

We first state the following lemma (without proof), which can be
verified by direct computations.

Lemma 5.1. Suppose that $G_i(\cdot)$ is symmetrical about $p = 1/2$ and $N_i$ is an
even integer. Then,

(a) $f_i(x) \binom{N_i}{x}^{-1} \le f_i(y) \binom{N_i}{y}^{-1}$ for $0 \le y \le x \le N_i/2$.

(b) $W_i(x) \binom{N_i}{x}^{-1} \le W_i(y) \binom{N_i}{y}^{-1}$ for $0 \le y \le x \le N_i/2$ and $N_i/2 < x \le y \le N_i$.

(c) $W_i(y) \le W_i(N_i - y)$ for $0 \le y \le N_i/2$.


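Procedure 1.

For each $0 \le y \le N_i/2$, let

(5.1)  $\bar f_{in}(y) \binom{N_i}{y}^{-1} = \max_{y \le x \le N_i/2}\ \min_{0 \le z \le x} \Big\{ \sum_{a=z}^{x} \hat f_{in}(a) \binom{N_i}{a}^{-1} \Big/ (x - z + 1) \Big\}$,

and let $\bar f_{in}(N_i - y) = \bar f_{in}(y)$. Then, let

(5.2)  $\bar W_{in}(y) = \sum_{z=0}^{N_i} \theta(N_i, y, z)\, \bar f_{in}(z)$ for $y = 0, 1, \ldots, N_i$.

Define

(5.3)  $\bar\Delta_{in}(\mathbf{x}) = p_0 \bar f_{in}(x_i) - \bar W_{in}(x_i)$ if $p_0$ is known;
       $\bar\Delta_{in}(\mathbf{x}) = \bar W_{0n}(x_0) \bar f_{in}(x_i) - \bar W_{in}(x_i) \bar f_{0n}(x_0)$ if $p_0$ is unknown.

Finally, define the selection rule $\bar d_n$ as follows:

(5.4)  $\bar d_n(\mathbf{x}) = \{i \mid \bar\Delta_{in}(\mathbf{x}) \le 0\}$.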
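The max-min formula in (5.1) can be sketched directly. The code below is a brute-force illustration of a nonincreasing (antitonic) fit with equal weights applied to the normalized frequencies, followed by reflection about $N_i/2$. The function names are hypothetical, and the inner range $0 \le z \le x$ follows the reading of (5.1) adopted here.

```python
from fractions import Fraction as Fr
from math import comb


def antitonic_maxmin(g):
    """Nonincreasing fit by the max-min formula of (5.1), equal weights."""
    m = len(g)

    def avg(z, x):
        return sum(g[z:x + 1], Fr(0)) / (x - z + 1)

    return [max(min(avg(z, x) for z in range(x + 1)) for x in range(y, m))
            for y in range(m)]


def smooth_f(f_hat, N):
    """Procedure 1: smooth f_hat(y)/C(N, y) on 0..N/2, then reflect."""
    h = N // 2
    g = antitonic_maxmin([Fr(f_hat[y]) / comb(N, y) for y in range(h + 1)])
    half = [g[y] * comb(N, y) for y in range(h + 1)]
    return half + half[h - 1::-1]            # f(N - y) = f(y)


print(antitonic_maxmin([Fr(1), Fr(3), Fr(2)]))
```

In the printed example the three values are pooled into a single level, while an input that is already nonincreasing is returned unchanged. For larger $N_i$ a pool-adjacent-violators routine would give the same fit more efficiently than this cubic-time evaluation.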

Asymptotic Optimality of $\{\bar d_n\}$

Note that $\bar f_{in}(y) \binom{N_i}{y}^{-1}$, $y = 0, 1, \ldots, N_i$, are the isotonic estimators of
$f_i(y) \binom{N_i}{y}^{-1}$, based on $\hat f_{in}(x) \binom{N_i}{x}^{-1}$, $x = 0, 1, \ldots, N_i$, with equal weights. Since
$\hat f_{in}(x)$ is a strongly consistent estimator of $f_i(x)$ for all $x = 0, 1, \ldots, N_i$, then,
by Theorem 2.2 of Barlow et al. (1972), Lemma 4.1(b), (4.3) and the definition of
$\bar W_{in}(y)$, it is not hard to see that $\bar f_{in}(y)$ and $\bar W_{in}(y)$ are strongly consistent
estimators of $f_i(y)$ and $W_i(y)$, respectively.

Next, we consider the rate of convergence of the difference $r(G, \bar d_n) - r(G)$.
For each $0 \le y \le N_i$ and $\delta > 0$, by Theorem 2.1 of Barlow et al. (1972), we can
obtain the following inequality.

(5.5)  $P\{|\bar f_{in}(y) - f_i(y)| > \delta\}$
       $\le P\Big\{\sum_{x=0}^{N_i} \Big[\hat f_{in}(x) \binom{N_i}{x}^{-1} - f_i(x) \binom{N_i}{x}^{-1}\Big]^2 > \binom{N_i}{y}^{-2} \delta^2\Big\}$
       $\le \sum_{x=0}^{N_i} P\Big\{|\hat f_{in}(x) - f_i(x)| > \binom{N_i}{x} \binom{N_i}{y}^{-1} \delta\, (N_i + 1)^{-1/2}\Big\}$.

Then, with a discussion similar to that given in Section 4, we can
conclude that $r(G, \bar d_n) - r(G) \le O(\exp\{-c_4 n\})$ for some $c_4 > 0$.

It is easy to see that the new estimators $\bar f_{in}(y)$, $0 \le y \le N_i$, always
satisfy the constraint of Lemma 5.1(a). However, one would also like to
see whether the estimators $\bar W_{in}(y)$, $0 \le y \le N_i$, satisfy the corresponding
constraint or not. The following lemma is useful for this purpose.

Lemma 5.2. Let $U(x)$, $h(x)$ be nonnegative functions defined on $\{0, 1, \ldots, N\}$,
where $N$ is an even positive integer, which satisfy

(a) $U(x) = \frac{x+1}{N-x}\, U(N - x - 1)$ for all $x = 0, 1, \ldots, N - 1$,

(b) $U(x) + U(N - x) = h(x) = h(N - x)$ for all $x = 0, 1, \ldots, N$, and

(c) $U(x) \le U(N - x)$ for all $x = 0, 1, \ldots, N/2$.

Then,

(d) $(x+1)\, h(x+1) \le (N - x)\, h(x)$ for all $x = 0, 1, \ldots, N/2 - 1$.

We note that (a), (b) and (d) of Lemma 5.2 do not imply (c), and
the estimators $\bar W_{in}(y)$, $0 \le y \le N_i$, do not always satisfy the required
constraint. Lemma 5.2 suggests resmoothing based on $\bar W_{in}(y)$.

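Procedure 2. Resmoothing Based on $\bar W_{in}(y)$

First, let $Q_{in}(N_i) = \bar W_{in}(N_i)$ and for each $N_i/2 \le y \le N_i - 1$, let

(5.6)  $Q_{in}(y) = \Big[\bar W_{in}(y) \binom{N_i}{y}^{-1} + \bar W_{in}(N_i - y - 1) \binom{N_i}{N_i - y - 1}^{-1}\Big] \Big/ 2$.

Step 1. For each $N_i/2 \le y \le N_i$, let

(5.7)  $Q^*_{in}(y) = \max_{N_i/2 \le x \le y}\ \min_{x \le z \le N_i} \Big\{ \sum_{a=x}^{z} Q_{in}(a) \Big/ (z - x + 1) \Big\}$.

Step 2. Let $W^*_{in}(N_i) = Q^*_{in}(N_i)$ and for each $N_i/2 \le y \le N_i - 1$, let

$W^*_{in}(y) = Q^*_{in}(y) \binom{N_i}{y}$ and $W^*_{in}(N_i - y - 1) = Q^*_{in}(y) \binom{N_i}{N_i - y - 1}$.

Then, let

(5.8)  $f^*_{in}(y) = W^*_{in}(y) + W^*_{in}(N_i - y)$ for $y = 0, 1, \ldots, N_i$,

and define

(5.9)  $\Delta^*_{in}(\mathbf{x}) = p_0 f^*_{in}(x_i) - W^*_{in}(x_i)$ if $p_0$ is known;
       $\Delta^*_{in}(\mathbf{x}) = W^*_{0n}(x_0) f^*_{in}(x_i) - W^*_{in}(x_i) f^*_{0n}(x_0)$ if $p_0$ is unknown.

Finally, define the selection rule $d^*_n$ as follows:

(5.10)  $d^*_n(\mathbf{x}) = \{i \mid \Delta^*_{in}(\mathbf{x}) \le 0\}$.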


Remark. By Step 1 and Step 2 of Procedure 2, the estimators $W^*_{in}(y)$, $0 \le y \le N_i$,
always satisfy the constraint of Lemma 5.1(b) and (c). Then, by Lemma 5.2,
the estimators $f^*_{in}(y)$, $0 \le y \le N_i$, also satisfy the corresponding constraint.




Asymptotic Optimality of $\{d^*_n\}$

By Theorem 2.2 of Barlow et al. (1972) and the fact that $\bar W_{in}(y)$, $0 \le y \le N_i$, are
strongly consistent estimators of $W_i(y)$, $0 \le y \le N_i$, we conclude that $W^*_{in}(y)$,
$0 \le y \le N_i$, are strongly consistent estimators of $W_i(y)$, $0 \le y \le N_i$. Then, by
Lemma 4.1(b) and (5.8), $f^*_{in}(y)$, $0 \le y \le N_i$, are also consistent estimators of
$f_i(y)$, $0 \le y \le N_i$. Therefore, the sequence of empirical Bayes selection rules
$\{d^*_n\}$ is asymptotically optimal.

By Theorem 2.1 of Barlow et al. (1972) and (5.8), we obtain, for $\delta > 0$,

(5.11)  $P\{|f^*_{in}(y) - f_i(y)| > \delta\}$
        $\le P\{|W^*_{in}(y) - W_i(y)| > \delta/2\} + P\{|W^*_{in}(N_i - y) - W_i(N_i - y)| > \delta/2\}$
        $\le P\Big\{\sum_{x=0}^{N_i} \Big[W^*_{in}(x) \binom{N_i}{x}^{-1} - W_i(x) \binom{N_i}{x}^{-1}\Big]^2 > \binom{N_i}{y}^{-2} \delta^2/4\Big\}$
        $\quad + P\Big\{\sum_{x=0}^{N_i} \Big[W^*_{in}(x) \binom{N_i}{x}^{-1} - W_i(x) \binom{N_i}{x}^{-1}\Big]^2 > \binom{N_i}{N_i - y}^{-2} \delta^2/4\Big\}$
        $\le 2 P\Big\{\sum_{x=0}^{N_i} \Big[\bar W_{in}(x) \binom{N_i}{x}^{-1} - W_i(x) \binom{N_i}{x}^{-1}\Big]^2 > \binom{N_i}{y}^{-2} \delta^2/4\Big\}$
        $\le 2 \sum_{x=0}^{N_i} P\Big\{|\bar W_{in}(x) - W_i(x)| > \binom{N_i}{x} \binom{N_i}{y}^{-1} \delta\, (N_i + 1)^{-1/2}/2\Big\}$.

Then by (4.3), (5.2) and (5.5), with a discussion similar to that given
in Theorem 4.3, we conclude that $r(G, d^*_n) - r(G) \le O(\exp\{-c_5 n\})$ for some $c_5 > 0$.